As you may already know, child obesity is a serious concern in most Western countries, and particularly in the U.K. A U.K. government report stated that 1 in 5 children in Reception year (4-5 year olds) were obese. As children grow older, obesity rates increase: by Year 6 (10-11 year olds), over a third of children were classed as overweight or obese, and around 20% were classed as obese.
Childhood obesity is more prevalent in London than England overall. In 2018/19, some 23.2% of children in Year 6 were considered obese in London, compared to 20.2% in England. - Trust for London
For this project, I will be looking at the boroughs of Greater London in the U.K. I will be aiming to answer the question:
I will be doing this for UK public health bodies, to see whether local authorities should limit the number of business licences they grant, or promote certain types of venues, in order to improve child health in their boroughs.
There are 32 boroughs in Greater London with a population of 8.92 million people.
If I can show a correlation between average income, child obesity levels and the number of unhealthy venues in a borough:
This will allow local councils and health services to improve the health and future wellbeing of the children in their boroughs. Tackling childhood obesity at an early stage, before long-term health and employment problems start to have a serious impact on the children's wellbeing, could also save councils and health services vast sums of money.
First, we need some economic data on the average income for each Greater London borough. The data comes from https://data.london.gov.uk/dataset/earnings-place-residence-borough and contains the average weekly income per borough for the years 2002 to 2019. We are only interested in the most recent figures, for 2019. I downloaded the xlsx file to local storage for convenience.
As you can see, the data contains a lot of junk, so let's clean it up. We only need the 'Area' column, which contains the borough names, and the '2019' column, which contains the average weekly income.
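The cleaning step can be sketched with pandas as below. This is a minimal sketch on placeholder rows, not the real spreadsheet: the `clean_income` helper, the placeholder borough names and the renamed column are hypothetical, and the exact `pd.read_excel` arguments (sheet name, header rows to skip) are assumptions to check against the downloaded file.

```python
import pandas as pd

# Placeholder rows standing in for the downloaded spreadsheet; the real file would be
# loaded with pd.read_excel(path, ...), where the sheet name and the number of header
# rows to skip are assumptions to check by opening the file. Depending on the file,
# the year header may also load as the integer 2019 rather than the string "2019".
raw_income = pd.DataFrame({
    "Code": ["X001", "X002", None],
    "Area": ["Borough A", "Borough B", None],
    "2018": [500.0, 700.0, None],
    "2019": [510.0, 720.0, None],
})

def clean_income(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only the borough name and the 2019 average weekly income."""
    income = raw[["Area", "2019"]].dropna()  # drop blank/junk rows
    income = income.rename(columns={"2019": "Average weekly income (2019)"})
    return income.reset_index(drop=True)

income = clean_income(raw_income)
print(income)
```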
I got the childhood obesity data from https://www.trustforlondon.org.uk/data/child-obesity/ and downloaded a csv of it to local storage for convenience. Let's have a look at the uncleaned data.
Now I will clean the data. We are only interested in the 'Area' and 'Proportion of obese children in Year 6 (2018/19)' columns.
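Selecting those two columns is a one-liner in pandas. The sketch below uses placeholder rows in place of the Trust for London csv (which would be read with `pd.read_csv(path)`). Only the two column names come from the text. The extra "Unused column" and the borough names are made up.

```python
import pandas as pd

OBESITY_COL = "Proportion of obese children in Year 6 (2018/19)"

# Placeholder rows standing in for the downloaded csv; values are made up.
raw_obesity = pd.DataFrame({
    "Area": ["Borough A", "Borough B"],
    "Unused column": ["x", "y"],
    OBESITY_COL: [0.25, 0.15],
})

# Keep just the borough name and the Year 6 obesity proportion.
obesity = raw_obesity[["Area", OBESITY_COL]].copy()
print(obesity)
```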
Now I will merge the 'income' and 'obesity' dataframes into a new dataframe called 'lon_in_ob'.
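A minimal sketch of the merge, assuming both dataframes share the 'Area' column and using short placeholder column names ("Income", "Obesity") and made-up values. An inner join drops any borough that is missing or spelled differently in either source, so it is worth listing the unmatched names before trusting the result.

```python
import pandas as pd

# Placeholder data; "Income" and "Obesity" are hypothetical short column names.
income = pd.DataFrame({"Area": ["Borough A", "Borough B", "City of London"],
                       "Income": [520.0, 700.0, 900.0]})
obesity = pd.DataFrame({"Area": ["Borough A", "Borough B"],
                        "Obesity": [0.25, 0.15]})

# Inner join on the borough name: only names present in both frames survive.
lon_in_ob = pd.merge(income, obesity, on="Area", how="inner")

# Names found in one source but not the other (symmetric difference).
unmatched = set(income["Area"]) ^ set(obesity["Area"])
print(lon_in_ob)
print(unmatched)
```

Here the City of London falls out of the join, which matches the gap in the obesity data.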
First I will plot some choropleth maps to show the income and childhood obesity levels for each of the 32 Greater London boroughs.
I got the GeoJSON data for the boundary coordinates of the Greater London boroughs from https://skgrange.github.io/www/data/london_boroughs.json and downloaded it to local storage.
Now I will create 2 maps.
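Whatever library draws the maps (folium's `Choropleth`, for instance, matches data rows to features through a `key_on` path such as `'feature.properties.name'`), each GeoJSON feature has to be matched to a borough value. The sketch below does that matching by hand with the standard `json` module. It assumes the borough name is stored in `properties["name"]`, which should be checked against the downloaded file. The `attach_values` helper and the values are hypothetical.

```python
import json

# Tiny stand-in for the boroughs GeoJSON; assumes each feature keeps the
# borough name in properties["name"] (check against the real file).
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "Borough A"}, "geometry": null},
  {"type": "Feature", "properties": {"name": "Borough B"}, "geometry": null},
  {"type": "Feature", "properties": {"name": "City of London"}, "geometry": null}
]}
"""
values = {"Borough A": 0.25, "Borough B": 0.15}  # placeholder obesity proportions

def attach_values(geojson: dict, values: dict, key: str = "name", field: str = "value") -> list:
    """Copy each borough's value onto its feature; return names that have no value."""
    missing = []
    for feature in geojson["features"]:
        name = feature["properties"][key]
        if name in values:
            feature["properties"][field] = values[name]
        else:
            feature["properties"][field] = None  # shown as "no data" on the map
            missing.append(name)
    return missing

geojson = json.loads(geojson_text)
missing = attach_values(geojson, values)
print(missing)  # ['City of London']
```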
As you can see from the above maps there seems to be some correlation between income and childhood obesity.
Notice that the 2 boroughs with the lowest proportion of obese Year 6 children (Kingston upon Thames and Richmond upon Thames) are 2 of the wealthiest boroughs based on average weekly income.
Also notice that the 3 poorest boroughs (Enfield, Newham and Barking and Dagenham) are also the 3 boroughs with the highest proportion of obese Year 6 children (see table below).
NB: There is no data available for the City of London, as it is not a London borough (the black region at the centre of the maps).
These 2 boroughs have a significantly lower than average proportion of obese Year 6 children for Greater London boroughs, and a significantly higher than average weekly income.
By contrast, these boroughs have a higher than average proportion of obese Year 6 children for Greater London boroughs, and a significantly lower than average weekly income.
From the boxplot you can see that there are 2 outliers, Kingston upon Thames and Richmond upon Thames, which lie more than 1.5 times the interquartile range below the first quartile. The middle 50% of boroughs have between 21% and 24.5% of their Year 6 children classed as obese.
There are no outliers in average weekly income. There is quite a big spread between the minimum and maximum average weekly income, with the middle 50% of boroughs having an average weekly income of between £545 and £645.
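The boxplot's outlier rule (Tukey's fences) can be checked directly. The sketch below flags values more than 1.5 × IQR below the first quartile or above the third. The `iqr_outliers` helper is hypothetical, and the sample values are made up, not the real borough figures. pandas' default linear interpolation is used for the quartiles, so the fences may differ slightly from a plotting library that computes quartiles differently.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values more than k * IQR below Q1 or above Q3."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up proportions for illustration only.
sample = pd.Series([0.21, 0.22, 0.23, 0.24, 0.245, 0.12], index=list("ABCDEF"))
print(iqr_outliers(sample))
```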
From the plot above we can make the following deductions:
This means that a borough with lower than the mean average weekly income is 15/5 = 3 times more likely to have an above-mean proportion of obese Year 6 children than a below-mean one. This describes boroughs, not individual children: it does not mean that a child living in such a borough is 3 times more likely to be obese than not.
A borough with higher than the mean average weekly income has odds of 6/6 = 1, i.e. it is just as likely to have an above-mean proportion of obese Year 6 children as a below-mean one.
20 out of 32 boroughs have an average income below the mean. Of these, 15 (75%) have a higher proportion of obese children than the mean.
12 out of 32 boroughs have an average income above the mean. Of these, 6 (50%) have a higher proportion of obese children than the mean.
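These counts come from splitting the boroughs at the mean income and, within each half, counting how many are above the mean obesity proportion. A minimal sketch with pandas follows. The `split_by_mean` helper, the column names and the toy numbers are hypothetical, and boroughs exactly at the obesity mean are counted as "low".

```python
import pandas as pd

def split_by_mean(df: pd.DataFrame, income_col: str, obesity_col: str) -> dict:
    """Count boroughs with above-mean obesity on each side of the mean income."""
    income_mean = df[income_col].mean()
    obesity_mean = df[obesity_col].mean()
    groups = {
        "below_mean_income": df[df[income_col] < income_mean],
        "above_mean_income": df[df[income_col] >= income_mean],
    }
    summary = {}
    for label, group in groups.items():
        high = int((group[obesity_col] > obesity_mean).sum())
        low = len(group) - high
        summary[label] = {
            "boroughs": len(group),
            "high_obesity": high,
            "share_high": high / len(group) if len(group) else float("nan"),
            # Odds that a borough in this group is above rather than below the obesity mean.
            "odds_high": high / low if low else float("inf"),
        }
    return summary

# Made-up numbers for illustration only.
toy = pd.DataFrame({
    "Area": list("ABCDEF"),
    "Income": [500, 520, 540, 700, 720, 740],
    "Obesity": [0.26, 0.25, 0.20, 0.22, 0.18, 0.17],
})
summary = split_by_mean(toy, "Income", "Obesity")
print(summary)
```

With the counts quoted above (15 of 20 and 6 of 12), the odds work out to 15/5 = 3 and 6/6 = 1.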
Notice that the correlation between income and childhood obesity appears fairly strong when income is below the mean: here 10 boroughs fall within the 95% confidence band and fit quite closely to the regression line. When average income is above the mean, only 3 boroughs fall within the 95% confidence band.
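The 95% band is the confidence interval drawn around the fitted line (seaborn's `regplot`, for example, draws a 95% band by default). A complementary check is to compute Pearson's r separately for the boroughs below and above the mean income. The sketch below does that with pandas. The helper name and the toy data are hypothetical. With only around a dozen boroughs on one side, r will be noisy.

```python
import pandas as pd

def correlation_by_income_side(df: pd.DataFrame, income_col: str, obesity_col: str) -> dict:
    """Pearson r between income and obesity, computed separately below and above mean income."""
    below = df[income_col] < df[income_col].mean()
    return {
        "below_mean_income": df.loc[below, income_col].corr(df.loc[below, obesity_col]),
        "above_mean_income": df.loc[~below, income_col].corr(df.loc[~below, obesity_col]),
    }

# Made-up numbers: a clean downward trend below the mean, no trend above it.
toy = pd.DataFrame({
    "Income": [500, 520, 540, 700, 720, 740],
    "Obesity": [0.26, 0.25, 0.24, 0.20, 0.15, 0.20],
})
r = correlation_by_income_side(toy, "Income", "Obesity")
print(r)
```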
Looking at the spread of points, it is unlikely that I will be able to find a polynomial regression model that wouldn't overfit the data.
From the data we can see that there is some correlation between the average weekly income of a borough and the proportion of obese year 6 children in that borough. This correlation is more obvious in the boroughs that have a lower average weekly income. In the boroughs that have a higher than average weekly income, there seems to be little correlation, with the same number of boroughs that have a higher than mean proportion of obese children as there are boroughs that have a lower than mean proportion of obese children.
At the extremes there is a correlation: Barking and Dagenham, the poorest borough, has the highest proportion of childhood obesity.
At the other extreme, Richmond upon Thames, the second wealthiest borough, has the lowest proportion of childhood obesity.
However, Kensington and Chelsea, which has the highest average weekly income, also has one of the highest proportions of childhood obesity.
There doesn't seem to be much correlation among the boroughs with an average weekly income between £525 and £725, which is where most of the boroughs lie. For example, Wandsworth and Barnet have significantly different average weekly incomes but similar proportions of childhood obesity.
Now let's examine whether there is a link between venues and the proportion of childhood obesity in each borough. For this I will use the Foursquare API to pull venue data around Greater London postcode coordinates.
The first (outward) part of a UK postcode is made up of the Postcode Area + Postcode District.
Postcode Area – this is the largest geographical unit of the postcode. Each one comprises one or two alpha characters, generally chosen to be a mnemonic of the area, e.g. MK for Milton Keynes, SO for Southampton. There are currently 124 Postcode Areas, including Guernsey (GY), Jersey (JE) and the Isle of Man (IM).
Postcode District – each Postcode Area is divided into a number of districts, represented by the numerical portion of the outward code. These numbers range from 0 to 99, e.g. MK42. In London, a further alpha character is used to divide some districts into sub-divisions, e.g. EC1A.
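The Area/District split described above can be expressed as a regular expression over the outward code. This is a simplified sketch for illustration, not a full postcode validator, and the `split_outward` function is hypothetical.

```python
import re

# Outward code = Postcode Area (1-2 letters) + Postcode District (1-2 digits),
# optionally followed by the extra London sub-division letter (e.g. EC1A).
OUTWARD = re.compile(r"^(?P<area>[A-Z]{1,2})(?P<district>\d{1,2})(?P<subdivision>[A-Z]?)$")

def split_outward(code: str):
    """Return (area, district number, sub-division letter or '') for an outward code."""
    match = OUTWARD.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an outward code: {code!r}")
    return match.group("area"), int(match.group("district")), match.group("subdivision")

print(split_outward("MK42"))  # ('MK', 42, '')
print(split_outward("EC1A"))  # ('EC', 1, 'A')
```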
First, let's get the list of Postcode Area codes for Greater London. I will get this data from https://www.robertsharp.co.uk/2017/08/09/a-table-that-shows-the-uk-region-for-all-postcode-districts/
Let's clean up the data and obtain what we need.
We are only interested in the postcode prefixes for Greater London postcodes.
Now we have to add the Postcode District numbers to the Postcode Area prefixes. UK Postcode District numbers range from 0 to 99, so let's generate every combination.
N.B. Not every Postcode Area has all of the districts from 0 to 99; this is just a dataframe of all possible Greater London Postcode Districts.
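Generating every area/number combination is a simple cross product. A minimal sketch follows. The area list here is a short illustrative subset, and the function name is hypothetical. Note that a purely numeric 0 to 99 range does not produce the lettered London sub-divisions such as EC1A. Most generated districts will not exist and are expected to drop out when matched against real postcode data.

```python
import pandas as pd

def all_candidate_districts(areas):
    """Every Postcode Area prefix combined with every district number 0-99."""
    rows = [(area, f"{area}{n}") for area in areas for n in range(100)]
    return pd.DataFrame(rows, columns=["Postcode Area", "Postcode District"])

# A few London area prefixes for illustration only.
candidates = all_candidate_districts(["E", "N", "SE"])
print(len(candidates))  # 300
```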
Now let's get the geographical coordinates for UK postcodes. I got them from https://www.freemaptools.com/download/full-postcodes/ukpostcodes.zip and downloaded the file to local storage for convenience.
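The file lists full postcodes, so some reduction is needed to get one point per district. One possible approach, assumed here rather than taken from the text, is to average the coordinates of all full postcodes in each district and inner-join the result to the candidate districts. The sketch below also assumes the file has 'postcode', 'latitude' and 'longitude' columns. The rows and coordinates are placeholders.

```python
import pandas as pd

# Placeholder rows; the real file is assumed to have these three columns.
postcodes = pd.DataFrame({
    "postcode": ["E1 1AA", "E1 1AB", "N1 1AA"],
    "latitude": [51.52, 51.51, 51.53],
    "longitude": [-0.07, -0.08, -0.12],
})

# The district (outward code) is everything before the space.
postcodes["Postcode District"] = postcodes["postcode"].str.split().str[0]
district_coords = (postcodes
                   .groupby("Postcode District", as_index=False)[["latitude", "longitude"]]
                   .mean())

# Candidate districts that do not exist in the file are dropped by the inner join.
candidates = pd.DataFrame({"Postcode District": ["E0", "E1", "N1", "N2"]})
london_districts = candidates.merge(district_coords, on="Postcode District", how="inner")
print(london_districts)
```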
First I will examine whether the number of 'Food' and 'Athletics & Sports' venues in an area has any correlation with the level of childhood obesity in the borough.
I will use the Foursquare API to get up to 50 'Food' and 'Athletics & Sports' venues within a radius of 1000 m of every postcode in the Greater London area.
In the Food category, I will only look at the subcategories that are more likely to be linked to childhood obesity.
The subcategories I will look at are: 'Bakery', 'Burger Joint', 'Dessert Shop', 'Donut Shop', 'Fast Food Restaurant', 'Fish & Chips Shop', 'Fried Chicken Joint', 'Pizza Place', 'Snack Place' and 'Wings Joint'.
In the 'Athletics & Sports' category, I will only look at the subcategories that are more likely to be used by children.
The subcategories I will look at are: 'Badminton Court', 'Basketball Court', 'Boxing Gym', 'Gym Pool', 'Gymnastics Gym', 'Martial Arts Dojo', 'Track', 'Skate Park', 'Soccer Field', 'Tennis Court', 'Volleyball Court', 'Indoor Play Area', 'Park', 'Playground' and 'Recreation Center'.
I will use the Foursquare API with category IDs taken from the Foursquare website to search only for the venues that I believe have the biggest influence on childhood obesity.
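The request for each postcode can be sketched as a URL builder. This assumes the legacy Foursquare v2 `venues/search` endpoint, which has since been superseded by newer versions of the Places API, and its `ll`, `radius`, `limit` and comma-separated `categoryId` parameters. The `intent=browse` setting, the version date and the placeholder category IDs and credentials are all assumptions, not values from the original project.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (assumed); check current Foursquare documentation before use.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_params(lat, lng, category_ids, client_id, client_secret,
                        version="20200101", radius=1000, limit=50):
    """Query parameters for one postcode's venue search (parameter names assumed from v2)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                  # API version date, placeholder value
        "ll": f"{lat},{lng}",
        "intent": "browse",            # assumption: needed for radius to apply
        "radius": radius,              # metres
        "limit": limit,                # 50 was the per-request maximum
        "categoryId": ",".join(category_ids),
    }

def build_search_url(*args, **kwargs):
    return f"{SEARCH_URL}?{urlencode(build_search_params(*args, **kwargs))}"

# Placeholder credentials and category IDs; nothing is sent over the network here.
params = build_search_params(51.5, -0.12, ["ID_A", "ID_B"], "MY_ID", "MY_SECRET")
url = build_search_url(51.5, -0.12, ["ID_A", "ID_B"], "MY_ID", "MY_SECRET")
print(url)
```

The URL could then be fetched with something like `requests.get(url).json()`, looping over every postcode's coordinates.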
Here are the first 5 rows of the dataframe 'london_venues', which contains the data returned by the Foursquare requests.
Notice that we retrieved venues for 283 postcodes, not the 287 we expected. This is because 4 postcodes had none of the 'Unhealthy Food' or 'Athletics & Sports' venues we are interested in within a radius of 1000 m.
Notice also that we retrieved only 8325 venues, rather than 50 venues for every postcode. This is because some postcodes are in areas with very few local 'Food' and 'Athletics & Sports' venues, such as residential, industrial or business areas.
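Both observations can be checked from the venues dataframe: which expected postcodes returned nothing, and how far the total falls short of the per-request cap (with the figures above, 283 × 50 = 14,150 would be the ceiling, against 8325 retrieved). A minimal sketch, assuming a 'Postcode District' column in the venues dataframe. The helper name and toy data are hypothetical, and the toy uses a cap of 3 so the numbers stay small.

```python
import pandas as pd

def coverage_report(expected_districts, venues: pd.DataFrame,
                    district_col: str = "Postcode District", limit: int = 50) -> dict:
    """Summarise which districts returned venues and how close each came to the cap."""
    returned = set(venues[district_col])
    per_district = venues.groupby(district_col).size()
    return {
        "missing": sorted(set(expected_districts) - returned),
        "venues": len(venues),
        "max_possible": limit * len(returned),
        "capped": int((per_district >= limit).sum()),  # districts that hit the cap
    }

# Toy data: E1 hits the cap of 3, N1 returns fewer, E2 returns nothing.
toy_venues = pd.DataFrame({"Postcode District": ["E1"] * 3 + ["N1"] * 2})
report = coverage_report(["E1", "E2", "N1"], toy_venues, limit=3)
print(report)
```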
See bar graph below.
As you can see, Parks are the most frequent venue, followed by Pizza Places and Fast Food Restaurants. Playgrounds, Tennis Courts and Soccer Fields are the 6th, 10th and 11th most frequent venues.
Across the top 20 venue categories there are 7650 venues: 2240 are Athletics & Sports venues (29.28%) and 5410 are Unhealthy Food venues (70.72%).
So just over two thirds of the venues are Unhealthy Food venues and just under one third are Athletics & Sports venues.
Greater London doesn't seem like a very healthy city for children.
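The top-20 percentages can be checked from the raw counts with a few lines of Python. The `shares` helper is hypothetical, and the counts are the ones quoted above.

```python
def shares(counts: dict) -> dict:
    """Percentage share of each group, rounded to 2 decimal places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}

top20 = {"Athletics & Sports": 2240, "Unhealthy Food": 5410}
print(shares(top20))  # {'Athletics & Sports': 29.28, 'Unhealthy Food': 70.72}
```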
You can see that:
In this cluster, 2 of the top 10 most frequent venues are Athletics & Sports related and 8 are Unhealthy Food related.
Fast Food Restaurants are the most frequent venue in this cluster, accounting for 17.94% of its venues.
If we add up all the Unhealthy Food venues we get 62.48%. As we are only looking at the top 10 and not the whole cluster, we can deduce that at least 62.48% of the venues within 1000 m of these postcodes are Unhealthy Food venues.
If we add up all the Athletics & Sports venues we get 16.81%, so by the same reasoning at least 16.81% of the venues within 1000 m of these postcodes are Athletics & Sports venues suitable for children.
Among the top 10, there are fewer than a third as many sports venues as Unhealthy Food venues, so unhealthy food dominates this cluster.
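The "at least" reasoning can be made concrete: take a cluster's venue categories, compute each category's percentage, keep the top n, and sum the shares belonging to each group. Anything outside the top n is ignored, so the sums are lower bounds. A minimal sketch with pandas, using the subcategory lists given earlier. The helper name and toy data are hypothetical.

```python
import pandas as pd

UNHEALTHY = {"Bakery", "Burger Joint", "Dessert Shop", "Donut Shop", "Fast Food Restaurant",
             "Fish & Chips Shop", "Fried Chicken Joint", "Pizza Place", "Snack Place", "Wings Joint"}
SPORTS = {"Badminton Court", "Basketball Court", "Boxing Gym", "Gym Pool", "Gymnastics Gym",
          "Martial Arts Dojo", "Track", "Skate Park", "Soccer Field", "Tennis Court",
          "Volleyball Court", "Indoor Play Area", "Park", "Playground", "Recreation Center"}

def top_n_shares(venue_categories: pd.Series, n: int = 10) -> dict:
    """Summed % share of unhealthy and sports categories among the n most frequent ones."""
    freq = venue_categories.value_counts(normalize=True) * 100
    top = freq.head(n)
    return {
        "unhealthy_pct": round(top[top.index.isin(UNHEALTHY)].sum(), 2),
        "sports_pct": round(top[top.index.isin(SPORTS)].sum(), 2),
    }

# Toy cluster of 10 venues, made up for illustration.
toy = pd.Series(["Pizza Place"] * 4 + ["Park"] * 3 + ["Fast Food Restaurant"] * 2 + ["Skate Park"])
print(top_n_shares(toy, n=2))   # only the top 2 categories counted
print(top_n_shares(toy, n=10))  # every category counted
```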
In this cluster, 6 of the top 10 most frequent venues are Unhealthy Food related and 4 are Athletics & Sports related.
Parks are the most frequent venue in this cluster, accounting for 29.09% of its venues.
If we add up all the Unhealthy Food venues we get 42.22%. As we are only looking at the top 10 and not the whole cluster, we can deduce that at least 42.22% of the venues within 1000 m of these postcodes are Unhealthy Food venues.
If we add up all the Athletics & Sports venues we get 41.67%, so by the same reasoning at least 41.67% of the venues within 1000 m of these postcodes are Athletics & Sports venues suitable for children.
There seem to be roughly as many sports venues for children as there are Unhealthy Food venues in these postcodes.
In this cluster, 7 of the top 10 most frequent venues are Unhealthy Food related and 3 are Athletics & Sports related.
Pizza Places are the most frequent venue in this cluster, accounting for 16.08% of its venues.
If we add up all the Unhealthy Food venues we get 58.17%. As we are only looking at the top 10 and not the whole cluster, we can deduce that at least 58.17% of the venues within 1000 m of these postcodes are Unhealthy Food venues.
If we add up all the Athletics & Sports venues we get 21.97%, so by the same reasoning at least 21.97% of the venues within 1000 m of these postcodes are Athletics & Sports venues suitable for children.
Among the top 10, there are fewer than half as many sports venues as Unhealthy Food venues in this cluster.
We would have expected the 2 boroughs with the lowest proportion of childhood obesity (Kingston upon Thames and Richmond upon Thames) to have the most postcodes in the 'Healthy' cluster (red markers), but between them the 2 boroughs contain:
And we would have expected the 3 boroughs with the highest proportion of childhood obesity (Enfield, Newham and Barking and Dagenham) to contain mainly 'Unhealthy' and 'Moderately Unhealthy' postcodes (cyan markers); we find that between them they contain:
As you can see, there are indeed more 'Unhealthy' markers in these boroughs, but surprisingly there are also more 'Healthy' markers than in the 2 boroughs with the lowest obesity levels.
You can also see that the south London and central London boroughs have a high concentration of red markers, but not a significantly lower proportion of obese children. In fact, these boroughs have similar obesity levels to the boroughs in North West London, which contain very few 'Healthy' markers.
Here again we see very little correlation between the average weekly income of a borough and the number of 'Healthy' markers.
We can see that 2 of the boroughs with the highest average weekly income (Kensington & Chelsea and Hammersmith & Fulham) have no 'Healthy' markers.
By contrast, some boroughs in south and south west London (Croydon and Sutton) have relatively low average incomes but quite a high proportion of 'Healthy' markers.
From this study we can draw a few conclusions:
We would have expected the boroughs with the highest childhood obesity levels to have the highest proportion of unhealthy food venues. This didn’t turn out to be the case, with the unhealthy markers quite evenly spread.
Conversely, we would have expected the boroughs with the lowest proportion of childhood obesity to have the highest proportion of 'Healthy' markers, but this wasn't the case either.
So there are likely to be other factors, beyond the food and sports venues, that increase the likelihood of childhood obesity in the poorer boroughs. These could include housing, parents' educational levels, the ethnic makeup of the boroughs or the standard of schools in the borough.
I can conclude that there is no simple solution to childhood obesity in Greater London: the problem needs to be looked at from many different angles, as many factors are involved, not just income.